A C++ API and Simulator for Hardware.
Cash is a C++ embedded domain-specific library (EDSL) for hardware design and simulation. It uses template metaprogramming and macro-based reflection to extend the C++ language with hardware-specific constructs. Cash enables developers to describe and simulate their hardware designs in a single source program, leveraging the large ecosystem of C++ development tools and libraries.
Cash requires a C++17 compiler to build and works best with clang 9 to leverage its custom plugin for code reflection.
Other dependencies include:
Install Build Essentials:
$ sudo apt-get install build-essential git cmake zlib1g-dev
Install IVerilog:
$ sudo apt-get install iverilog
Install LLVM 9 (Ubuntu 18.04 and above):
$ sudo apt-get install clang-9 libclang-9-dev
Install LLVM 9 (Ubuntu 16.04):
$ wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|sudo apt-key add -
$ sudo add-apt-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-9 main"
$ sudo apt-get update
$ sudo apt-get install clang-9 libclang-9-dev
To install Cash you must clone the repository and create a build directory:
$ git clone https://github.com/gtcasl/cash.git && cd cash
$ mkdir build && cd build
Then run cmake to generate the makefile and export the package information:
$ cmake ..
Build and install Cash on your system:
$ make -j`nproc` all
$ sudo make install
Test your build:
$ make test
Alternative Installation using LIBJIT Compiler
Install LIBJIT dependencies:
$ sudo apt-get install libtool autoconf flex bison texinfo
Build and install LIBJIT:
$ git clone https://git.savannah.gnu.org/git/libjit.git
$ pushd libjit
$ ./bootstrap
$ mkdir build
$ pushd build
$ ../configure --with-pic
$ make -j`nproc` all
$ sudo make install
$ popd
$ popd
Build and install Cash using ‘JIT=LIBJIT’ configuration option:
$ mkdir build && cd build
$ cmake .. -DJIT=LIBJIT
$ make -j`nproc` all
$ sudo make install
$ mkdir demo
$ cd demo
$ cp /path_to_project/scripts/Makefile .
#include <cash/core.h>
#include <assert.h>
#include <iostream>
using namespace ch::core;
// Generic MAC module
template <unsigned I, unsigned O>
struct MAC {
__io (
__in (ch_bool) enable,
__in (ch_int<I>) a_in,
__in (ch_int<I>) b_in,
__out (ch_int<I>) a_out,
__out (ch_int<I>) b_out,
__out (ch_int<O>) c_out
);
void describe() {
auto sum = io.c_out + ch_mul<O>(io.a_in, io.b_in);
io.a_out = ch_nextEn(io.a_in, io.enable, 0);
io.b_out = ch_nextEn(io.b_in, io.enable, 0);
io.c_out = ch_nextEn(sum, io.enable, 0);
}
};
// Generic MatMul module
template <unsigned I, unsigned O, unsigned N, unsigned P, unsigned M>
struct MatMul {
__io (
__in (ch_bool) valid_in,
__in (ch_vec<ch_int<I>, N>) a_in,
__in (ch_vec<ch_int<I>, P>) b_in,
__out (ch_vec<ch_vec<ch_int<O>, P>, N>) c_out,
__out (ch_bool) valid_out
);
void describe() {
// systolic 2D array of MAC units
ch_vec<ch_vec<ch_module<MAC<I, O>>, P>, N> macs;
// a simple counter
ch_uint<log2up(N+P+M)> ctr;
ctr = ch_nextEn(ctr + 1, io.valid_in, 0);
// MAC array connections
for (unsigned r = 0; r < N; ++r) {
auto p = ch_delayEn(io.a_in[r], io.valid_in, r, 0);
for (unsigned c = 0; c < P; ++c) {
auto q = ch_delayEn(io.b_in[c], io.valid_in, c, 0);
macs[r][c].io.enable = io.valid_in;
macs[r][c].io.a_in = c ? macs[r][c-1].io.a_out.as_int() : p;
macs[r][c].io.b_in = r ? macs[r-1][c].io.b_out.as_int() : q;
io.c_out[r][c] = macs[r][c].io.c_out;
}
}
// output valid?
io.valid_out = ch_nextEn(ctr == N+P+M-1, io.valid_in, false);
}
};
static constexpr int InBits = 8;
static constexpr int OutBits = 24;
static constexpr int N = 2;
static constexpr int P = 3;
static constexpr int M = 4;
int main() {
// a=MxN, b=PxM, c=PxN
int a[N][M] = { { 0, 1, 2, 3 }, { 4, 5, 6, 7 } };
int b[M][P] = { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 }, { 9, 10, 11 } };
int c[N][P] = { { 42, 48, 54 }, { 114, 136, 158 } };
ch_device<MatMul<InBits, OutBits, N, P, M>> matmul;
ch_tracer tracer(matmul);
tracer.run([&](ch_tick t)->bool {
matmul.io.valid_in = true;
auto j = t / 2;
for (size_t i = 0; i < N; ++i) {
matmul.io.a_in[i] = (j < M) ? a[i][j] : 0;
}
for (size_t i = 0; i < P; ++i) {
matmul.io.b_in[i] = (j < M) ? b[j][i] : 0;
}
return !matmul.io.valid_out;
}, 2);
std::cout << "result = " << matmul.io.c_out << std::endl;
// Verify
for (size_t j = 0; j < N; ++j) {
for (size_t i = 0; i < P; ++i) {
assert(c[j][i] == matmul.io.c_out[j][i]);
}
}
ch_toVerilog("matmul.v", matmul);
tracer.toVCD("matmul.vcd");
return 0;
}
$ make
$ ./demo.out
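The expected values stored in ‘c’ in the QuickStart listing can be cross-checked on the host without Cash. The sketch below is a plain C++ reference model; the name ‘matmul_ref’ is hypothetical and not part of the Cash API. It assumes ‘a’ is N x M, ‘b’ is M x P and ‘c’ is N x P, matching the array declarations in ‘main()’.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Plain C++ reference model (hypothetical helper, not part of Cash):
// computes c = a * b where a is N x M, b is M x P and c is N x P.
template <std::size_t N, std::size_t M, std::size_t P>
std::array<std::array<int, P>, N> matmul_ref(const int (&a)[N][M],
                                             const int (&b)[M][P]) {
  std::array<std::array<int, P>, N> c{}; // zero-initialized accumulators
  for (std::size_t r = 0; r < N; ++r) {
    for (std::size_t col = 0; col < P; ++col) {
      for (std::size_t k = 0; k < M; ++k) {
        c[r][col] += a[r][k] * b[k][col]; // one multiply-accumulate step
      }
    }
  }
  return c;
}
```

Calling ‘matmul_ref(a, b)’ with the QuickStart inputs gives a golden result that the simulated ‘c_out’ values can be compared against.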
The top-level description of a hardware block in Cash is a module, described using a C++ struct or class. A valid Cash module should define at least two properties:
Cash modules can be extended like any other C++ class using inheritance or polymorphism. Likewise, class methods or functions can also be defined to improve abstraction and code reuse.
Our QuickStart example above describes a ‘MatMul’ top module that consumes a 2D array of ‘MAC’ sub-modules. The ‘MatMul’ top module is instantiated in the host ‘main()’ routine using the ‘ch_device<T>’ transform. The ‘MAC’ sub-modules are instantiated inside the top module using the ‘ch_module<T>’ transform.
Category | Description |
Primary Types | ch_bit, ch_int, ch_uint, ch_bool |
Literal Types | binary, octal, decimal, hexadecimal |
IO Types | __in, __out, __interface |
Sequential Types | ch_reg, ch_mem |
User-Defined Types | ch_vec, __enum, __struct, __union |
Component Types | ch_device, ch_module, ch_udf |
Extended Types | ch_fixed, ch_float, ch_complex |
The primary types are the main storage elements for computation in the language, with ch_bit<N> representing a collection of consecutive bits. The other primary types, boolean (ch_bool), unsigned integer (ch_uint<N>), and signed integer (ch_int<N>), derive from ch_bit<N>. Extended types are implemented in the hardware template library as extensions to the primary types. Data types are configurable via C++ templates to specify the bit width of the object.
Cash extends C++ built-in literals with binary, octal, and hexadecimal literals. The size of a literal can be specified explicitly or inferred automatically from its value. The following code snippet shows the declaration of three literals a, b, and c.
auto a = 1010_b4; // 4-bit binary
auto b = 4040_o; // octal with size auto deduced!
auto c = 1080_h128; // 128-bit hexadecimal
Input/Output types in Cash are implemented using type specifiers that assign a direction to incoming and outgoing signals. __in(T) defines an input signal of data type T. __out(T) defines an output signal of data type T. The port interface of a module is declared using the __io () construct, where all the inputs and outputs are defined. Inside the module’s describe() implementation, the port interface is accessed via the io public class member. The following example illustrates a simple use of I/O types for a generic full adder. The io interface holds three input ports cin, lhs, and rhs, and two output ports out and cout.
template <unsigned N>
struct Adder {
__io (
__in (ch_uint1) cin,
__in (ch_uint<N>) lhs,
__in (ch_uint<N>) rhs,
__out (ch_uint<N>) out,
__out (ch_uint1) cout
);
void describe() {
auto sum = ch_pad<1>(io.lhs) + io.rhs + io.cin;
io.out = ch_slice<N>(sum);
io.cout = sum[N];
}
};
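To make the bit-level intent of the Adder listing concrete, here is a small software model in plain C++. It is only a sketch: ‘AddResult’ and ‘add_model’ are hypothetical names, the model is limited to widths up to 63 bits, and it assumes the hardware sum is N+1 bits wide, with the low N bits driving ‘out’ and bit N driving ‘cout’.

```cpp
#include <cassert>
#include <cstdint>

// Result of an N-bit addition: the N-bit sum and the carry-out bit.
struct AddResult {
  std::uint64_t out;
  bool cout;
};

// Software model of the Adder listing (hypothetical helper, not Cash code).
// Operands are truncated to N bits before the addition.
template <unsigned N>
AddResult add_model(std::uint64_t lhs, std::uint64_t rhs, bool cin) {
  static_assert(N >= 1 && N <= 63, "model limited to 1..63 bits");
  const std::uint64_t mask = (std::uint64_t{1} << N) - 1;
  // (N+1)-bit sum, mirroring ch_pad<1>(io.lhs) + io.rhs + io.cin
  const std::uint64_t sum = (lhs & mask) + (rhs & mask) + (cin ? 1u : 0u);
  // Low N bits correspond to ch_slice<N>(sum); bit N corresponds to sum[N]
  return { sum & mask, ((sum >> N) & 1u) != 0 };
}
```

For example, ‘add_model<4>(9, 8, true)’ adds 9 + 8 + 1 = 18, which does not fit in 4 bits, so the model reports a sum of 2 with the carry set.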
Sequential objects in Cash are defined using the generic objects ch_reg<T> and ch_mem<T,N>, which declare register and memory objects respectively. The default clock and reset signals are declared implicitly by the compiler. By default, sequential objects are updated on the rising edge of the default clock and the reset is synchronous. Cash uses ‘next-value’ semantics to specify the next state of a register object. The following listing shows a Cash implementation of a generic FIFO, configurable by providing the enclosed element data type T and depth N. Register variables ‘rp’ and ‘wp’ are assigned their next value via the ‘rp->next’ and ‘wp->next’ members, respectively.
template <typename T, unsigned N>
class Fifo {
__io (
(enq_io<T>) enq,
(ch_flip_io<enq_io<T>>) deq
);
static constexpr unsigned A = log2ceil(N); // number of address bits
void describe() {
ch_mem<T, N> ram;
ch_reg<ch_uint<A+1>> rp(0), wp(0);
auto r = io.deq.ready && io.deq.valid;
auto w = io.enq.valid && io.enq.ready;
auto ra = ch_slice<A>(rp);
auto wa = ch_slice<A>(wp);
rp->next = ch_sel(r, rp + 1, rp);
__if (w) {
ram[wa]->next = io.enq.bits;
wp->next = wp + 1;
};
io.deq.bits = ram[ra];
io.deq.valid = (wp != rp); // not empty
io.enq.ready = (wa != ra) || (wp[A] == rp[A]); // not full
}
};
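The Fifo listing above distinguishes full from empty by giving each pointer one extra wrap bit on top of its A address bits. The following plain C++ sketch models only that pointer arithmetic, without any storage; ‘FifoPointers’ is a hypothetical name, and the depth is assumed to be a power of two (2^A entries).

```cpp
#include <cassert>
#include <cstdint>

// Software model of the read/write pointer scheme (hypothetical, not Cash code).
// Each pointer holds A address bits plus one wrap bit, for a depth of 2^A.
template <unsigned A>
struct FifoPointers {
  static_assert(A >= 1 && A < 31, "address width must fit in 32 bits");
  static constexpr std::uint32_t kAddrMask = (1u << A) - 1;
  static constexpr std::uint32_t kPtrMask = (1u << (A + 1)) - 1;

  std::uint32_t rp = 0; // read pointer, A+1 bits
  std::uint32_t wp = 0; // write pointer, A+1 bits

  // Empty: both pointers are identical, including the wrap bit.
  bool empty() const { return rp == wp; }

  // Full: same address bits but different wrap bits.
  bool full() const {
    return ((rp & kAddrMask) == (wp & kAddrMask)) && (rp != wp);
  }

  // Number of stored entries, computed modulo 2^(A+1).
  std::uint32_t size() const { return (wp - rp) & kPtrMask; }

  // Advance the write pointer; returns false when the queue is full.
  bool push() {
    if (full()) return false;
    wp = (wp + 1) & kPtrMask;
    return true;
  }

  // Advance the read pointer; returns false when the queue is empty.
  bool pop() {
    if (empty()) return false;
    rp = (rp + 1) & kPtrMask;
    return true;
  }
};
```

In this model, ‘!empty()’ plays the role of ‘io.deq.valid’ and ‘!full()’ the role of ‘io.enq.ready’ in the listing.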
There are three preferred ways of using registers in Cash:
1) ch_next(obj, init); ch_nextEn(obj, enable, init);
ch_bool x;
auto y = ch_next(x, false);
2) ch_delay(obj, delay, init); ch_delayEn(obj, enable, delay, init);
ch_bool x;
auto y = ch_delay(x, 4, 0); // 4 cycles shift registers
3) ch_reg
ch_reg<ch_bool> x(0);
x->next = x + 1;
Use option 1) as much as possible for simplicity if you need a one-cycle latch.
Use option 2) if you need to delay the signal for multiple cycles.
Use option 3) if you need a more complex logic for the register.
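The difference between a single latch and a multi-cycle delay can be illustrated with a small software model. The sketch below is plain C++, not Cash code: ‘DelayLine’ is a hypothetical class that models D registers in series with an optional enable, under the assumption that a disabled cycle leaves every stage unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Software model of an enabled delay line (hypothetical, not Cash code):
// D registers in series, all initialized to the same value.
template <typename T, std::size_t D>
class DelayLine {
  static_assert(D >= 1, "a delay line needs at least one stage");
public:
  explicit DelayLine(T init) { regs_.fill(init); }

  // Current output: the value that entered D enabled cycles ago.
  T out() const { return regs_[D - 1]; }

  // Simulate one rising clock edge; nothing moves when enable is false.
  void tick(T in, bool enable = true) {
    if (!enable) return;
    for (std::size_t i = D - 1; i > 0; --i) {
      regs_[i] = regs_[i - 1]; // shift towards the output
    }
    regs_[0] = in;
  }

private:
  std::array<T, D> regs_;
};
```

With D = 1 the model behaves like the single latch of option 1); with D = 4 it mirrors the 4-cycle shift register of option 2).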
The Cash DSL supports aggregate types including enums, structs, and unions, defined using __enum(), __struct(), and __union () declarations, respectively. Static vectors are defined using ch_vec<T, N> declaration where T is the enclosed data type and N the number of entries in the container. Composition, inheritance, and templates are also supported on user-defined types to enable the full power of abstraction. The following listing shows a definition of an enum ‘FlitType’, a generic union ‘FlitData’, a struct ‘Flit’, and a vector ‘Flits’.
__enum (FlitType, (
Invalid,
Valid
));
template <unsigned N>
__union (FlitData, (
(ch_int<N>) vi,
(ch_float) vf
));
template <unsigned N>
__struct (Flit, (
(FlitType) type,
(FlitData<N>) data
));
template <unsigned N>
using Flits = ch_vec<Flit<N>, 16>;
Name | Description | DataTypes | Category |
Equal | == | primary types | Equality |
Not Equal | != | primary types | |
Less | < | signed/unsigned types | Relational |
Less or Equal | <= | signed/unsigned types | |
Greater | > | signed/unsigned types | |
Greater or Equal | >= | signed/unsigned types | |
Not | ! | primary types | Logical |
And | && | primary types | |
Or | || | primary types | |
Inverse | ~, ch_inv | primary types | Binary |
And | &, ch_and | primary types | |
Or | |, ch_or | primary types | |
Xor | ^, ch_xor | primary types | |
Reduce And | ch_andr | primary types | Reduce |
Reduce Or | ch_orr | primary types | |
Reduce Xor | ch_xorr | primary types | |
Shift Left | <<, ch_shl | primary types | Shift |
Shift Right | >>, ch_shr | primary types | |
Rotate Left | ch_rotl | primary types | Rotate |
Rotate Right | ch_rotr | primary types | |
Neg | -, ch_neg | signed/unsigned types | Arithmetic |
Addition | +, ch_add | signed/unsigned types | |
Subtraction | -, ch_sub | signed/unsigned types | |
Multiplication | *, ch_mul | signed/unsigned types | |
Division | /, ch_div | signed/unsigned types | |
Modulus | %, ch_mod | signed/unsigned types | |
Bit Select | [] | primary types | Subscript |
Slicing | ch_slice | primary types | |
Ternary | ch_sel | primary types | Conditionals |
Multi-Selection | ch_case | primary types | |
Minimum | ch_min | primary types | |
Maximum | ch_max | primary types | |
Padding | ch_pad | primary types | Resizing |
Resizing | ch_resize | all types | |
Concatenation | ch_cat | all types | |
Replication | ch_dup | all types | |
Bit Shuffling | ch_shuffle | all types | Permutations |
Reinterpret Cast | ch_as | all types | Cast |
Register Cast | as_reg | all types | |
Clone | ch_clone | all types | References |
Reference | ch_ref | all types | |
Slice Reference | ch_sliceref | all types | |
Aligned Slice Reference | ch_asliceref | all types | |
Group Assignment | ch_tie | all types | |
Map | ch_map | all types | Higher-Order |
Fold | ch_fold | all types | |
Zip | ch_zip | all types | |
Scan | ch_scan | all types | |
Single Latch | ch_next | all types | Buffers |
Single Latch w/ enable | ch_nextEn | all types | |
Delay Buffer | ch_delay | all types | |
Delay Buffer w/ enable | ch_delayEn | all types | |
Current Clock | ch_clock | all types | Clock Domain |
Current Reset | ch_reset | all types | |
Push Clock | ch_pushcd | all types | |
Pop Clock | ch_popcd | all types | |
Clock Region | ch_cd | all types | |
Print | ch_print | all types | Debugging |
Print NewLine | ch_println | all types | |
Assertion | ch_assert | all types | |
Signal Tapping | ch_tap | all types | |
Current Time | ch_now | all types | |
Cash implements combinational circuits via C++ operators. Operators are natively supported on the primary and extended data types, but they are also accessible via inheritance on derived I/O and sequential types. The above table presents a classification of most of the operators defined in the DSL. When the bit widths of the source operands do not match, the DSL zero-extends or sign-extends them depending on their sign. The output bit width is inferred automatically from the operands’ sizes and the type of the operation. The DSL also provides a function-based API for combinational circuits, to supplement existing operators or to add support for operators that are not natively available in C++, such as rotation or bit slicing.
To cast a variable from one type to another, static casts are supported on all primary and extended types using the native C++ static_cast operator. To perform a reinterpret cast, primary and extended types implement a generic method as<U>() that reinterprets the bits of a variable as a new type U. The following code snippet illustrates the various uses of the cast operators.
ch_int4 obj1 = 0x1;
auto obj2 = static_cast<ch_int8>(obj1); // static cast
auto obj3 = obj1.as<ch_uint4>(); // reinterpret cast
auto obj4 = obj1.as_uint(); // short form of previous line
Data type assignments in Cash are by reference, as in Java, which expands support for a wider range of design patterns that exploit object reassignment. The DSL provides two utility operators, clone() and ref(), to copy a variable or create a pointer to it, respectively, during assignments. The following code snippet illustrates various uses of the instance operators.
ch_uint4 a = 0x0, b = a, c = a.clone(), d = a.ref();
a = 0x1; // only b and d are modified
b = 0x2; // only b is modified
c = 0x3; // only c is modified
d = 0x4; // a is also modified
There are two types of control flow support in Cash: static control flow and dynamic control flow.
Static control flows are control flow operations that can be constructed at compile time. Cash implements static control flow using combinational MUX circuits. The DSL provides utility functions ch_sel() and ch_case() for describing static control flow as dataflow operators. The following code snippet shows some sample usages of the static control flow operators.
ch_int4 a = 0x1, b = 0x2, c = 0x3;
// x = (a == 0) ? b : ((a == 1) ? c : 0);
auto x1 = ch_sel(a == 0, b, ch_sel(a == 1, c, 0));
auto x2 = ch_sel(a == 0, b)(a == 1, c)(0); // short form
auto x3 = ch_case(a, 0, b)(1, c)(0); // key-value form
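The commented expression in the listing above is a chain of nested ternaries, which means the first matching condition wins. A plain C++ sketch of that first-match rule is given here for reference; ‘priority_select’ is a hypothetical helper, not a Cash function, and it operates on ordinary host values rather than hardware signals.

```cpp
#include <cassert>
#include <initializer_list>
#include <utility>

// First-match-wins selection over (condition, value) pairs
// (hypothetical helper, not part of Cash). Returns 'fallback'
// when no condition is true.
template <typename T>
T priority_select(std::initializer_list<std::pair<bool, T>> cases, T fallback) {
  for (const auto& c : cases) {
    if (c.first) return c.second; // earlier entries take priority
  }
  return fallback;
}
```

With a = 1, b = 2 and c = 3, the call ‘priority_select<int>({{a == 0, b}, {a == 1, c}}, 0)’ evaluates the same expression as ‘(a == 0) ? b : ((a == 1) ? c : 0)’ and selects c.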
The DSL also extends the C++ control flow statements with the __if, __elif, and __else constructs to represent more complex static control flow blocks. This feature enables nested control blocks as well as local variables to be used (see listing in the next section).
Dynamic control flow in hardware is implemented using finite state machines (FSM). FSMs are implemented in Cash using enumeration types and registers for state transitions. The following code snippet implements the body of an FSM with three states: ‘State::idle’, ‘State::run’, and ‘State::done’.
__enum (State, (idle, run, done));
void describe() {
ch_reg<State> state(State::idle);
__switch (state)
__case (State::idle) {
__if (io.valid) {
__if (io.count == 0) {
state->next = State::done;
}__else {
state->next = State::run;
};
};
}
__case (State::run) {
state->next = State::done;
}
__case (State::done) {
state->next = State::idle;
};
}
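The transitions of the FSM listing above can be written down as an ordinary next-state function, which is convenient for checking the intended behaviour on the host. This is a plain C++ sketch, not Cash code: ‘next_state’ is a hypothetical name, ‘valid’ and ‘count’ stand in for ‘io.valid’ and ‘io.count’, and it assumes that a register keeps its value when no ‘next’ assignment applies.

```cpp
#include <cassert>
#include <cstdint>

enum class State { idle, run, done };

// Next-state function mirroring the __switch block of the listing
// (hypothetical host-side model, not part of Cash).
inline State next_state(State s, bool valid, std::uint32_t count) {
  switch (s) {
    case State::idle:
      if (!valid) return State::idle; // assumed: register holds its value
      return (count == 0) ? State::done : State::run;
    case State::run:
      return State::done;
    case State::done:
      return State::idle;
  }
  return s; // unreachable for valid enum values
}
```

Stepping this function once per simulated clock cycle yields the state sequence the register is expected to follow.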
The Cash framework supports port interface structures for declaring an interface as a type outside the module. This feature allows interfaces to be defined once and shared between hardware modules inside a project. Interfaces can be nested as members of another interface. It is also possible to implement inheritance with interfaces. You declare an interface using the __interface () construct, inside which you place all your I/O ports.
One of the advantages of using interfaces is automatic binding, or bulk connection: to connect two interfaces you don’t have to explicitly connect each port, but simply bind the interfaces directly and let the compiler infer the correct connections between the nested ports. To bind two interfaces, you use the C++ call operator().
The following example defines three interfaces: link_io, plink_io, which derives from link_io, and filter_io, which uses plink_io as a nested member. The binding of these interfaces is illustrated inside module FilterBlock when connecting the sub-module Filter instances f1_ and f2_.
template <typename T>
__interface (link_io, (
__out (T) data,
__out (ch_bool) valid
));
template <typename T>
__interface (plink_io, link_io<T>, ( // using inheritance
__out (ch_bool) parity
));
template <typename T>
__interface (filter_io, (
(plink_io<T>) x, // nesting interfaces
(ch_flip_io<plink_io<T>>) y // using flipped interfaces
));
template <typename T>
struct Filter {
filter_io<T> io;
void describe() {
auto tmp = (ch_pad<1>(io.x.data) << 1)
| ch_pad<1>(io.x.parity);
io.y.data = ch_delay(ch_slice<T>(tmp), 1, 0);
io.y.parity = ch_delay(io.x.data[ch_width_v<T>-1], 1, 0);
io.y.valid = ch_delay(io.x.valid, 1, 0);
}
};
template <typename T>
struct FilterBlock {
filter_io<T> io;
void describe() {
f1_.io.x(io.x); // binding interfaces
f1_.io.y(f2_.io.x);
f2_.io.y(io.y);
}
ch_module<Filter<T>> f1_, f2_;
};
The Cash DSL supports defining custom clock and reset signals via clock domains. The DSL provides a stack-based interface for modifying the current clock domain. This is done using the ch_pushcd(clock, reset, posedge) and ch_popcd() built-in functions. The following example illustrates the use of clock domains with two user-defined clocks and a user-defined reset signal. The generated Verilog program and VCD trace are also shown.
#include <cash/core.h>
#include <assert.h>
#include <iostream>
#include <algorithm> // std::max, used by P_START
using namespace ch::core;
#define P_CLK1 1
#define P_CLK2 2
#define P_START (std::max(P_CLK1, P_CLK2) * 2)
#define P_STOP (P_START * 5)
template <unsigned N>
struct MyModule {
__io (
__in (ch_bool) clk1,
__in (ch_bool) clk2,
__in (ch_bool) reset,
__in (ch_uint<N>) din,
__out (ch_uint<N>) dout
);
void describe() {
// posedge clk1, negedge reset
ch_pushcd(io.clk1, ~io.reset, true);
ch_reg<ch_uint<N>> x(io.din);
ch_popcd();
// negedge clk2, negedge reset
ch_pushcd(io.clk2, ~io.reset, false);
ch_reg<ch_uint<N>> y(io.din);
ch_popcd();
// logic
x->next = x + 1;
y->next = y + 1;
io.dout = x + y;
// add local variables to debug trace
__tap(x);
__tap(y);
}
};
int main() {
ch_device<MyModule<8>> my_module;
ch_tracer tracer(my_module);
tracer.run([&](ch_tick t)->bool {
switch (t) {
case 0:
my_module.io.clk1 = 0;
my_module.io.clk2 = 1;
my_module.io.reset = 0;
my_module.io.din = 7;
break;
default:
if (0 == (t % P_CLK1)) my_module.io.clk1 = !my_module.io.clk1;
if (0 == (t % P_CLK2)) my_module.io.clk2 = !my_module.io.clk2;
if (t >= P_START) my_module.io.reset = 1;
break;
}
return (t <= P_STOP);
});
std::cout << "result = " << my_module.io.dout << std::endl;
ch_toVerilog("my_module.v", my_module);
tracer.toVCD("my_module.vcd");
return 0;
}
module MyModule(
input wire io_clk1,
input wire io_clk2,
input wire io_reset,
input wire[7:0] io_din,
output wire[7:0] io_dout
);
reg[7:0] x_11, y_16;
wire _inv_7;
wire[7:0] _add_19, _add_21, _add_22, x, y;
always @ (posedge io_clk1) begin
if (_inv_7)
x_11 <= io_din;
else
x_11 <= _add_19;
end
always @ (negedge io_clk2) begin
if (_inv_7)
y_16 <= io_din;
else
y_16 <= _add_21;
end
assign _inv_7 = ~io_reset;
assign _add_19 = x_11 + 8'h1;
assign _add_21 = y_16 + 8'h1;
assign _add_22 = x_11 + y_16;
assign x = x_11;
assign y = y_16;
assign io_dout = _add_22;
endmodule
The Cash extension API is mainly driven via User-Defined Functions (UDF). This interface allows programmers to extend the base API functionalities by defining their own functions to integrate with the rest of the framework. This facility is particularly important for two scenarios:
When prototyping new hardware where only the cycle-level or functional model of the sub-components is of interest.
When importing existing IPs given their functional implementation written in pure C++ or other frameworks like SystemC, or even an existing HDL component written in Verilog.
The DSL extension namespace implements two generic interfaces, ch_udf_comb<T> and ch_udf_seq<T>, for instantiating combinational or sequential user-defined functions respectively. A user-defined function is implemented using a struct or class, similar to how modules are described in Cash. Instead of describe(), it should provide an eval() method in which the user implements the desired functional or cycle-level model of the component. Functional modeling is done using the combinational ch_udf_comb<T> interface. Cycle-level modeling is done using the sequential ch_udf_seq<T> interface. Internally, the Cash simulator will call into the specified extension based on its execution model.
The listing below is a simple example that implements a functional design for an integer division extension ‘MyDiv’ using user-defined functions. The I/O ports ‘lhs’, ‘rhs’, and ‘dst’ are system space ports directly accessible by the host application. The eval() method implements the functional model for integer division using C++ directly. The example also includes an ALU test module to illustrate how to instantiate and use the user-defined function inside a Cash module using the ch_udf_comb<T> interface.
struct MyDiv {
__io (
__in (ch_int32) lhs,
__in (ch_int32) rhs,
__out (ch_int32) dst
);
void eval() {
io.dst = io.lhs / io.rhs;
}
void from_verilog(std::ostream& o) {
o << "assign $io.dst = $io.lhs / $io.rhs;";
}
};
struct ALU {
__io (
__in (ch_int32) a,
__in (ch_int32) b,
__out (ch_int32) c
);
void describe() {
ch_udf_comb<MyDiv> div;
div.io.lhs = io.a;
div.io.rhs = io.b;
io.c = div.io.dst;
}
};
The listing below is a simple example that implements a cycle-level design of a custom AES encryption accelerator via user-defined functions. The extension internally uses an existing cycle-level AES simulator which implements a tick() method for advancing its internal state. The ‘ch_udf_seq’ Cash interface is used in this case to tell the compiler that this extension supports sequential execution and should be invoked by the simulator on a per-clock-cycle basis.
struct AES {
__io (
__in (ch_int128) plaintext,
__in (ch_int128) key,
__out (ch_int128) ciphertext
);
void eval() {
io.ciphertext = sim_.output();
sim_.input(io.plaintext);
sim_.key(io.key);
sim_.tick(); // advance clock
}
AES_CAS_Simulator sim_;
};
struct SoC {
__io (
__in (ch_int128) a,
__in (ch_int128) b,
__out (ch_int128) c
);
void describe() {
ch_udf_seq<AES> aes;
aes.io.plaintext = io.a;
aes.io.key = io.b;
io.c = aes.io.ciphertext;
}
};
User-defined functions also allow existing Verilog code to be provided as part of the extension description. This is done via the from_verilog() method, which the user implements to provide the code for the Verilog components. In the integer division listing above, the from_verilog() method demonstrates how custom Verilog code can be included to perform the same division. It is also possible to simply provide the path to a Verilog program file and have the framework load it directly.
When an eval() method is also provided, the Cash compiler invokes the provided function during simulation and uses the Verilog code only during codegen, merging it with the rest of the generated Verilog program. When no eval() method is provided, Cash automatically generates the simulation stub to execute the Verilog code. The compiler internally uses Verilog VPI to communicate with the external Verilog modules using any user-provided Verilog simulator.
The Cash hardware template library (HTL) is a repository of generic reusable components that are provided to construct hardware blocks in a standardized and efficient manner to boost productivity. The HTL currently includes hardware queues, arbiters, crossbars, counters, encoders, decoders, pipe registers, muxes, fixed-point, floating-point, and complex numbers. The HTL objects are defined under the ‘ch::htl’ namespace and are added to the project by including their header file. Our MatMul QuickStart example illustrates the use of the ch_counter object from the HTL.
The Cash DSL exposes an interface to the high-speed simulator via the ch_simulator object to give developers fine-grained control over the simulation execution. The interface implements three relevant functions:
There is also a tracer object, ch_tracer, which extends ch_simulator to provide tracing capabilities to the simulator. The ch_tracer object implements the following functions to generate various traces for debugging:
There are three ways of invoking the Cash simulator:
1) Single-run mode: when the input values do not need to change during the simulation.
int main() {
ch_device<MyModule<ch_bit2, 2>> my_device;
ch_simulator simulator(my_device);
my_device.io.din = 1; // assign inputs
my_device.io.push = 1;
simulator.run(20); // run the simulation for 20 cycles
assert(my_device.io.full == true); // check outputs
return 0;
}
2) Callback mode: when the input values have to change during the simulation or when you need to check your output at a specific time. You use a lambda callback function to intercept the simulator and apply your changes before every simulation step.
int main() {
ch_device<MyModule<ch_bit2, 2>> my_device;
ch_simulator simulator(my_device);
simulator.run([&](ch_tick t)->bool {
switch (t) {
case 0:
my_device.io.din = 1; // assign inputs
my_device.io.push = 1;
break;
case 2:
assert(my_device.io.full == false); // check outputs
my_device.io.din = 2;
my_device.io.push = 1;
break;
case 4:
assert(my_device.io.full == true); // check outputs
break;
}
return (t <= 4);
});
return 0;
}
3) Stepping mode: when the input values have to change during the simulation or when you need to check your output at a specific time. You can directly invoke the simulation steps.
int main() {
ch_device<MyModule<ch_bit2, 2>> my_device;
ch_simulator simulator(my_device);
simulator.reset(); // invoke clock reset sequence
my_device.io.din = 1; // assign inputs
my_device.io.push = 1;
simulator.step(2); // advance one cycle (2 ticks)
my_device.io.din = 2;
my_device.io.push = 1;
simulator.step(2); // advance one cycle (2 ticks)
assert(my_device.io.full == true); // check outputs
return 0;
}
The Cash DSL provides the following diagnostic APIs to verify the hardware design at runtime:
Cash projects can also leverage existing C++ unit test frameworks like Google Test, Boost Test, or Catch for large-scale projects.
The step() function of the simulator object is the preferred choice when simulating a Cash model inside CAS-based architecture simulators like GEM5 and SST. In GEM5, this can be done via the processEvent() method of SimObject objects. In SST and Manifold, it is done by handling registered clock event callbacks on Component objects. The following listing illustrates the implementation of a simple SST component simulating a Cash model.
struct MyComponent : public Component {
MyComponent(...) {
ch_device<Adder<4>> my_adder;
my_sim = std::make_shared<ch_simulator>(my_adder);
auto clk = get_current_clock();
registerClock(clk, &MyComponent::tick);
}
void tick() {
my_sim->step(); // invoke Cash simulator
}
std::shared_ptr<ch_simulator> my_sim;
};
The Cash DSL currently provides two main functions for exporting HDL code:
Our MatMul QuickStart example illustrates the use of the ch_toVerilog function to generate Verilog HDL.
High-Level Synthesis (HLS) tools like Intel Quartus support a compiler that can convert OpenCL programs to RTL. The compiler provides an extension API for referencing custom RTL components inside OpenCL kernels and using them as external libraries during the synthesis flow. This interface can be used to optimize OpenCL kernels by providing fine-tuned RTL implementations of some components. Another application is profiling existing RTL models, since Quartus already provides the application software and system components to support the FPGA. RTL libraries in Quartus interact with the OpenCL kernel via the Avalon Bus Interface.
The Cash library provides a generic implementation of the Avalon interface and utility functions to offer a productive development environment for designing OpenCL RTL libraries. The library also implements an Avalon bus extension for the Cash simulator that enables the simulation of the OpenCL RTL libraries within the Cash environment. The same extension also provides support for simulation of RTL libraries inside the OpenCL emulator.
The Cash HTL implements the following objects for HLS integration:
The following example illustrates a Sobel filter OpenCL wrapper interface ‘sobel_ocl’ that uses the avm_reader and avm_writer interfaces. You may find an implementation of the Sobel filter in the ‘examples’ folder of the Cash source repository. The simulation code in the ‘main()’ function shows how the avm_slave_driver interface is used to simulate the Avalon bus interface.
#include <sobel.h>
#include <cash/eda/altera/avalon.h>
using namespace eda::altera::avalon;
template <typename T>
class sobel_ocl {
public:
__io (
__in (ch_uint<avm_v0::AddrW>) dst,
__in (ch_uint<avm_v0::AddrW>) src,
__in (ch_uint32) count,
(avalon_st_io) avs,
(avalon_mm_io<avm_v0>) avm_dst,
(avalon_mm_io<avm_v0>) avm_src
);
__enum (ctrl_state, (idle, running, done));
sobel_ocl(uint32_t width, uint32_t height, uint32_t pipelen)
: core_(width, height, pipelen)
{}
void describe() {
ch_reg<ctrl_state> state(ctrl_state::idle);
__switch (state)
__case (ctrl_state::idle) {
__if (io.avs.valid_in) {
state->next = ctrl_state::running;
};
}
__case (ctrl_state::running) {
__if (core_.io.done) {
state->next = ctrl_state::done;
};
}
__case (ctrl_state::done) {
__if (!avm_writer_.io.busy && io.avs.ready_in) {
state->next = ctrl_state::idle;
};
};
auto start = io.avs.valid_in && io.avs.ready_out;
io.avs.ready_out = (state == ctrl_state::idle);
io.avs.valid_out = (state == ctrl_state::done) && !avm_writer_.io.busy;
avm_reader_.io.base_addr = io.src;
avm_reader_.io.start = start;
avm_reader_.io.count = io.count;
avm_reader_.io.avm(io.avm_src);
avm_reader_.io.deq(core_.io.in);
avm_writer_.io.base_addr = io.dst;
avm_writer_.io.start = start;
avm_writer_.io.done = (state == ctrl_state::done);
avm_writer_.io.avm(io.avm_dst);
avm_writer_.io.enq(core_.io.out);
}
private:
ch_module<sobel_core<T>> core_;
ch_module<avm_reader<T>> avm_reader_;
ch_module<avm_writer<T>> avm_writer_;
};
int main() {
uint32_t width, height;
std::vector<uint8_t> src_image;
if (!readImage(src_image_file, &width, &height, &src_image))
return -1;
auto num_pixels = width * height;
auto num_blocks = ceildiv(num_pixels, 64);
src_image.resize(num_blocks * 64);
auto dst_image_size = num_blocks * 64;
std::vector<uint8_t> dst_image(dst_image_size, 0);
ch_device<sobel_ocl<ch_uint8>> device(width, height, 4);
// setup Avalon slave driver
avm_slave_driver<avm_v0> avm_driver(2, 128, 84);
avm_driver.bind(0, device.io.avm_src, src_image.data(), src_image.size());
avm_driver.bind(1, device.io.avm_dst, dst_image.data(), dst_image.size());
// run simulation
ch_tracer tracer(device);
device.io.avs.valid_in = false;
device.io.avs.ready_in = false;
auto ticks = tracer.run([&](ch_tick t)->bool {
if (2 == t) {
// start simulation
device.io.dst = 0;
device.io.src = 0;
device.io.count = num_pixels;
device.io.avs.valid_in = true;
device.io.avs.ready_in = true;
}
// tick avm driver
avm_driver.tick();
// stop simulation when done
return !device.io.avs.valid_out;
}, 2);
// flush pending requests
avm_driver.flush();
std::cout << "Simulation run time: " << std::dec << ticks/2 << " cycles" << std::endl;
ch_toVerilog("sobel_ocl.v", device);
tracer.toVCD("sobel_ocl.vcd");
return 0;
}
The Cash source repository includes the following examples.
You can execute any example manually using the following command:
$ cd build/examples
$ ../bin/adder
The Cash source repository includes a unit-test suite for validating the DSL, compiler, and codegen functionalities. The test suite is currently integrated with the Travis Continuous Integration and Codecov code coverage frameworks.
You can execute the unit tests manually using the following command:
$ cd build/tests
$ ../bin/testsuite
Contributions to this codebase are welcome; please email me at blaise.tine@gmail.com.
Released under the BSD license, see LICENSE for details.